Read writing from Renu Khandelwal on Medium. Loves learning, sharing, and discovering myself. Passio..
Why is monitoring system resources important?
"If you cannot measure it, you cannot improve it." (Lord Kelvin)
Monitoring helps you regularly evaluate the performance of critical system resources such as the CPU, memory, disk, network, and GPU.
Monitoring is critical for identifying which process is using the most resources and why. It helps you determine whether the system's current resources are sufficient for the workload or whether a rogue process is consuming too many of them.
Setting a limiting threshold on each system resource, and alerting when it is crossed, prevents issues from escalating further and helps narrow down the root cause so it can be fixed.
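Thresholds are easier to maintain when the alerting logic is kept separate from the code that collects the numbers. A minimal sketch, assuming two limits per metric (a warning level and a critical level); `Threshold`, `evaluate`, and `check_all` are hypothetical names, not part of psutil or GPUtil.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Hypothetical warning/critical limits for one metric, e.g. percent of disk used."""
    metric: str
    warning: float
    critical: float


def evaluate(threshold: Threshold, value: float) -> Optional[str]:
    """Return an alert message if value reaches a limit, otherwise None."""
    if value >= threshold.critical:
        return f"CRITICAL: {threshold.metric} at {value:.1f} (limit {threshold.critical})"
    if value >= threshold.warning:
        return f"WARNING: {threshold.metric} at {value:.1f} (limit {threshold.warning})"
    return None


def check_all(thresholds: List[Threshold], readings: Dict[str, float]) -> List[str]:
    """Evaluate every threshold that has a matching reading and collect the alerts."""
    alerts = []
    for threshold in thresholds:
        if threshold.metric in readings:
            message = evaluate(threshold, readings[threshold.metric])
            if message is not None:
                alerts.append(message)
    return alerts


if __name__ == "__main__":
    limits = [Threshold("disk_percent", 80, 95), Threshold("swap_percent", 45, 80)]
    # In practice the readings would come from psutil, e.g. psutil.disk_usage("/").percent
    print(check_all(limits, {"disk_percent": 83.0, "swap_percent": 12.5}))
```

Keeping the limits in data rather than in scattered `if` statements makes it simple to tune them per machine.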
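Here we will explore how to use psutil and GPUtil to profile the system and to monitor CPU temperature, memory, disk, network, GPU, and sensors.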
psutil (process and system utilities) is a cross-platform Python library for retrieving information on running processes and system utilization of resources such as CPU, memory, disks, network, and sensors.
psutil currently supports Linux, Windows, macOS, FreeBSD, OpenBSD, NetBSD, Sun Solaris, and AIX.
GPUtil is a Python module for getting the GPU status from NVIDIA GPUs using nvidia-smi.
Profile your system to find its name, OS version, whether it uses a 64-bit or 32-bit architecture, the number of physical and logical cores, and the maximum and minimum CPU frequency.
The platform module from Python's standard library retrieves platform-identifying data such as the device name, OS version, OS release, node name, and processor.
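`platform.uname()` describes the machine, but it does not directly say whether the running Python interpreter is a 64-bit build; the two can differ if a 32-bit Python runs on a 64-bit OS. A minimal sketch using only the standard library; `interpreter_bits` and `describe_system` are hypothetical helper names.

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return 64 or 32: the pointer size of the running Python interpreter in bits."""
    # struct.calcsize("P") is the size of a C void pointer in bytes (8 on 64-bit builds)
    return struct.calcsize("P") * 8


def describe_system() -> dict:
    """Collect a few platform facts, including whether this Python build is 64-bit."""
    uname = platform.uname()
    return {
        "system": uname.system,
        "machine": uname.machine,
        "interpreter_bits": interpreter_bits(),
        # sys.maxsize exceeds 2**32 only on 64-bit builds
        "is_64bit_interpreter": sys.maxsize > 2**32,
    }


if __name__ == "__main__":
    for key, value in describe_system().items():
        print(f"{key}: {value}")
```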
import psutil
import platform
uname = platform.uname()
print(f"System: {uname.system}") #Windows or Linux
print(f"Node Name: {uname.node}") # System name
print(f"Release: {uname.release}") # OS release version like 10(Windows) or 5.4.0-72-generic(linux)
print(f"Version: {uname.version}")
print(f"Machine: {uname.machine}") # machine can be AMD64 (Windows) or x86_64 (Linux)
print(f"Processor: {uname.processor}") # Intel64 Family 6 or x86_64
print("Physical cores:", psutil.cpu_count(logical=False))
print("Logical cores:", psutil.cpu_count(logical=True))
# CPU frequencies (max and min are reported as 0.0 if they cannot be determined)
cpufreq = psutil.cpu_freq()
print(f"Max Frequency: {cpufreq.max:.2f}MHz")
print(f"Min Frequency: {cpufreq.min:.2f}MHz")
print(f"Current Frequency: {cpufreq.current:.2f}MHz")
Monitor the temperature of each physical core, and raise an alert if the current temperature of any core exceeds that core's threshold.
psutil.sensors_temperatures(fahrenheit=True) returns the current, high, and critical temperatures reported by the hardware sensors, including each physical core. It is not available on Windows.
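The threshold comparison can also be factored into a small function that works on any list of readings carrying psutil's temperature fields (`label`, `current`, `high`, `critical`). A minimal sketch; the `Reading` tuple is a stand-in so the example runs without sensors, `classify_temperatures` is a hypothetical helper, and it treats reaching a limit (`>=`) as an alert and checks `critical` before `high`.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Stand-in with the same field names psutil uses for each temperature reading;
# real data would come from psutil.sensors_temperatures()["coretemp"] on Linux.
Reading = namedtuple("Reading", ["label", "current", "high", "critical"])


def classify_temperatures(readings: Iterable) -> List[Tuple[str, str]]:
    """Return (label, level) for every reading at or above its high or critical limit."""
    alerts = []
    for reading in readings:
        # Either limit may be None when the sensor does not report it
        if reading.critical is not None and reading.current >= reading.critical:
            alerts.append((reading.label, "critical"))
        elif reading.high is not None and reading.current >= reading.high:
            alerts.append((reading.label, "high"))
    return alerts


if __name__ == "__main__":
    # Illustrative values in Fahrenheit (176F = 80C, 212F = 100C)
    sample = [
        Reading("Core 0", 150.0, 176.0, 212.0),
        Reading("Core 1", 180.0, 176.0, 212.0),
        Reading("Core 2", 215.0, 176.0, 212.0),
        Reading("Core 3", 160.0, None, None),
    ]
    print(classify_temperatures(sample))
```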
# read the sensors once instead of calling the function on every line
temps = psutil.sensors_temperatures(fahrenheit=True)
# 'coretemp' is the Linux driver for Intel CPUs; other hardware uses different keys
for core in temps.get('coretemp', []):
    print(f"{core.label} has a temp of {core.current}F")
    if core.high is not None and core.current > core.high:  # high may be None
        print("Temp too high")
Virtual memory is the combination of RAM and disk space used by all the processes running on the system, while swap space is the portion of virtual memory on the disk that running processes use when RAM is full.
def get_size(num_bytes, suffix="B"):
    """
    Scale bytes to the proper format, e.g. KB, MB, GB, TB, and PB
    """
    factor = 1024
    for unit in ["", "K", "M", "G", "T", "P"]:
        if num_bytes < factor:
            return f"{num_bytes:.2f}{unit}{suffix}"
        num_bytes /= factor
    # anything at or beyond 1024 PB is reported in exabytes
    return f"{num_bytes:.2f}E{suffix}"
print("Virtual memory")
svmem = psutil.virtual_memory()
print(f"Total: {get_size(svmem.total)}")
print(f"Available: {get_size(svmem.available)}")
print(f"Used: {get_size(svmem.used)}")
print(f"Percentage: {svmem.percent}%")
print("SWAP memory")
# get the swap memory details (if it exists)
swap = psutil.swap_memory()
print(f"Total: {get_size(swap.total)}")
print(f"Free: {get_size(swap.free)}")
print(f"Used: {get_size(swap.used)}")
print(f"Percentage: {swap.percent}%")
psutil.virtual_memory(): returns statistics about system memory usage as a named tuple.
psutil.swap_memory(): returns swap memory statistics as a named tuple.
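A small worked example of the `percent` field: according to the psutil documentation, `virtual_memory().percent` is calculated as (total - available) / total * 100. The sketch reproduces that arithmetic with illustrative numbers; `memory_percent` and `GIB` are hypothetical names, not psutil functions.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def memory_percent(total: int, available: int) -> float:
    """Percent of memory in use, computed as (total - available) / total * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - available) / total * 100


if __name__ == "__main__":
    # Illustrative machine: 16 GiB of RAM with 4 GiB still available
    print(f"{memory_percent(16 * GIB, 4 * GIB):.1f}%")  # prints 75.0%
```

With psutil, passing `mem.total` and `mem.available` from `mem = psutil.virtual_memory()` should give a value close to `mem.percent`.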
VIRTUAL_MEMORY_THRESHOLD = 100 * 1024 * 1024  # alert when 100MB or less is available
SWAP_MEMORY_THRESHOLD = 45  # alert when 45% or more of swap is in use
if psutil.virtual_memory().available <= VIRTUAL_MEMORY_THRESHOLD:
    print("Low Virtual Memory warning")
if psutil.swap_memory().percent >= SWAP_MEMORY_THRESHOLD:
    print("High Swap Memory usage warning")
psutil.disk_partitions(): returns all mounted disk partitions as a list of named tuples, including the device, mount point, and file system type.
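When you only need usage for one known path, the standard library's `shutil.disk_usage` returns the same total/used/free trio without psutil. A minimal sketch; `disk_percent_used` and `disk_alert` are hypothetical helpers, the 82% default mirrors the threshold used in the partition loop, and because this computes used / total, the result may differ slightly from psutil's own `percent`.

```python
import shutil
import tempfile
from typing import Optional


def disk_percent_used(path: str) -> float:
    """Percent of the file system containing `path` that is in use (used / total)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def disk_alert(path: str, limit: float = 82.0) -> Optional[str]:
    """Return a warning string if usage exceeds `limit` percent, otherwise None."""
    percent = disk_percent_used(path)
    if percent > limit:
        return f"Disk space nearing full on {path}: {percent:.1f}% used"
    return None


if __name__ == "__main__":
    path = tempfile.gettempdir()
    print(f"{path}: {disk_percent_used(path):.1f}% used")
    print(disk_alert(path) or "Disk usage OK")
```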
print("Hard Disk Information")
print("Partitions and Usage:")
# get all disk partitions on the device
partitions = psutil.disk_partitions()
for partition in partitions:
    print("Device:", partition.device)
    print("Partition Mountpoint:", partition.mountpoint)
    print("Partition File system type:", partition.fstype)
    try:
        partition_usage = psutil.disk_usage(partition.mountpoint)
    except PermissionError:
        # skip partitions that cannot be read, e.g. a drive that is not ready
        continue
    print("Total Size:", get_size(partition_usage.total))
    print("Used Space:", get_size(partition_usage.used))
    print("Free hard disk Space:", get_size(partition_usage.free))
    print("Hard disk Used Percentage:", partition_usage.percent, "%")
    if partition_usage.percent > 82:
        print("Disk space nearing full")
All network protocols are associated with a specific address family. An address family provides services such as packet fragmentation and reassembly, routing, addressing, and transport, and it enables interprocess communication between processes running on the same system or on different systems.
An address family is normally composed of several protocols, one per socket type.
Different network address families and their purposes
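The numeric `family` values reported for each interface address can be mapped back to names with the standard library's `socket` module. A minimal sketch; `FAMILY_PURPOSES` is a short, non-exhaustive summary written for this example, and `family_name`/`describe_family` are hypothetical helpers. `AF_UNIX` and `AF_PACKET` do not exist on every platform.

```python
import socket

# Short summary of common address families and what they are used for
FAMILY_PURPOSES = {
    "AF_INET": "IPv4 addresses",
    "AF_INET6": "IPv6 addresses",
    "AF_UNIX": "local interprocess communication on the same host",
    "AF_PACKET": "raw link-layer packets, e.g. MAC addresses (Linux)",
}


def family_name(family: int) -> str:
    """Map a numeric address family to its socket constant name, e.g. 'AF_INET'."""
    try:
        return socket.AddressFamily(family).name
    except ValueError:
        return f"unknown ({family})"


def describe_family(family: int) -> str:
    """Combine the constant name with its purpose, if known."""
    name = family_name(family)
    return f"{name}: {FAMILY_PURPOSES.get(name, 'no description available')}"


if __name__ == "__main__":
    for fam in (socket.AF_INET, socket.AF_INET6):
        print(describe_family(fam))
```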
import socket
print("Network Information")
# get all network interfaces (virtual and physical)
if_addrs = psutil.net_if_addrs()
for interface_name, interface_addresses in if_addrs.items():
    print(" Interface:", interface_name)
    for address in interface_addresses:
        # compare with the constants; the text of str(address.family) varies across Python versions
        if address.family == socket.AF_INET:
            print(" IP Address:", address.address)
            print(" Netmask:", address.netmask)
            print(" Broadcast IPv4:", address.broadcast)
        elif address.family == psutil.AF_LINK:
            # psutil.AF_LINK is an alias for AF_PACKET on Linux
            print(f" MAC Address: {address.address}")
            print(f" Netmask: {address.netmask}")
            print(f" Broadcast MAC: {address.broadcast}")
        elif address.family == socket.AF_INET6:
            print(" IP Address:", address.address)
            print(" Netmask:", address.netmask)
            print(" Broadcast IPv6:", address.broadcast)
psutil.net_io_counters(): returns system-wide network I/O statistics such as bytes sent and received and the number of incoming and outgoing packets that were dropped.
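The values returned by `net_io_counters()` are cumulative, so a transfer rate needs two snapshots taken some seconds apart. A minimal sketch of that arithmetic; the `Counters` tuple is a stand-in carrying two of psutil's field names so the example runs without psutil, and `throughput` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in with two of the fields psutil.net_io_counters() returns
Counters = namedtuple("Counters", ["bytes_sent", "bytes_recv"])


def throughput(before, after, interval_s: float) -> dict:
    """Bytes per second sent and received between two counter snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        "sent_per_s": (after.bytes_sent - before.bytes_sent) / interval_s,
        "recv_per_s": (after.bytes_recv - before.bytes_recv) / interval_s,
    }


if __name__ == "__main__":
    # With psutil: first = psutil.net_io_counters(); time.sleep(2); second = psutil.net_io_counters()
    first = Counters(bytes_sent=1_000_000, bytes_recv=5_000_000)
    second = Counters(bytes_sent=1_500_000, bytes_recv=7_000_000)
    print(throughput(first, second, interval_s=2.0))
```

With real psutil snapshots, the default `nowrap=True` keeps the counters from going backwards if the operating system's counters wrap around.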
net_io = psutil.net_io_counters()
print("Total Bytes Sent:", get_size(net_io.bytes_sent))
print("Total Bytes Received:", get_size(net_io.bytes_recv))
print("Total incoming packets dropped:", net_io.dropin)
print("Total outgoing packets dropped:", net_io.dropout)
print("Total outgoing errors:", net_io.errout)
print("Total incoming errors:", net_io.errin)
GPUtil is a Python library for getting the status of NVIDIA GPUs. It lists all the NVIDIA GPUs available on the device along with their free memory, used memory, and temperature in degrees Celsius.
import GPUtil
gpus = GPUtil.getGPUs()
gpulist = []
for gpu in gpus:
    print(gpu.name)
    print("gpu.id:", gpu.id)
    print(f"Total GPU memory: {gpu.memoryTotal}MB")
    print(f"Memory free: {gpu.memoryFree}MB")
    print(f"Memory used: {gpu.memoryUsed}MB")
    print("GPU memory use proportion:", gpu.memoryUtil * 100, "%")
    print(str(gpu.temperature) + " C")
    gpulist.append([gpu.id, gpu.memoryTotal, gpu.memoryUsed, gpu.memoryUtil * 100])
THRESHOLD_GPU = 10  # percent of GPU memory in use
for gpu in gpus:
    print(gpu.name, "gpu.id:", gpu.id)
    # memoryUtil is memoryUsed / memoryTotal, so multiply by 100 for a percentage
    if gpu.memoryUtil * 100 > THRESHOLD_GPU:
        print(f"GPU memory usage currently is {gpu.memoryUtil * 100:.1f}%, which exceeds the threshold of {THRESHOLD_GPU}%")
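GPUtil collects its numbers by running nvidia-smi, so the same figures can also be read directly when GPUtil is not installed. A minimal sketch, assuming nvidia-smi is on the PATH; the query field names are standard nvidia-smi fields, while `parse_gpu_csv`, `over_memory_threshold`, and `query_gpus` are hypothetical helpers. Memory use is computed as used / total * 100, the same ratio GPUtil exposes as `memoryUtil`.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Standard nvidia-smi query fields
QUERY_FIELDS = "index,name,memory.used,memory.total,temperature.gpu"


def _to_float(text: str) -> Optional[float]:
    """Convert a numeric field, returning None for values such as '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_csv(text: str) -> List[Dict]:
    """Parse output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        # index comes first and the last three fields are numeric; the name sits in between
        gpus.append({
            "id": int(parts[0]),
            "name": ", ".join(parts[1:-3]),
            "memory_used_mb": _to_float(parts[-3]),
            "memory_total_mb": _to_float(parts[-2]),
            "temperature_c": _to_float(parts[-1]),
        })
    return gpus


def over_memory_threshold(gpus: List[Dict], threshold_pct: float) -> List[int]:
    """IDs of GPUs whose memory use (used / total * 100) exceeds threshold_pct."""
    flagged = []
    for gpu in gpus:
        used, total = gpu["memory_used_mb"], gpu["memory_total_mb"]
        # skip GPUs with missing values or a zero total
        if used is not None and total and used / total * 100 > threshold_pct:
            flagged.append(gpu["id"])
    return flagged


def query_gpus() -> List[Dict]:
    """Run nvidia-smi (must be on the PATH) and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        gpus = query_gpus()
        print(gpus)
        print("Over 10% memory:", over_memory_threshold(gpus, 10))
    else:
        print("nvidia-smi not found; no NVIDIA GPU data available")
```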
To monitor IoT sensors, we can retrieve hardware temperatures, fan speeds, and battery information, define a threshold for each, and raise an alert when a threshold is reached.
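For the battery, one way to turn a reading into an alert is to warn only when the device is running on battery below some level. A minimal sketch; the `Battery` tuple is a stand-in with the field names of `psutil.sensors_battery()` (`percent`, `secsleft`, `power_plugged`), `battery_alert` is a hypothetical helper, and the 20% default is an arbitrary example. A `power_plugged` value of None (unknown) is treated as unplugged.

```python
from collections import namedtuple
from typing import Optional

# Stand-in with the fields of psutil.sensors_battery(): percent, secsleft, power_plugged
Battery = namedtuple("Battery", ["percent", "secsleft", "power_plugged"])


def battery_alert(battery: Optional[Battery], min_percent: float = 20.0) -> Optional[str]:
    """Return a warning when running on battery below min_percent, otherwise None."""
    if battery is None:
        # psutil.sensors_battery() returns None when no battery is found
        return None
    if not battery.power_plugged and battery.percent < min_percent:
        return f"Battery low: {battery.percent:.0f}% remaining"
    return None


if __name__ == "__main__":
    print(battery_alert(Battery(percent=15.0, secsleft=1200, power_plugged=False)))
```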
psutil.sensors_temperatures()  # hardware temperatures, in Celsius by default
psutil.sensors_fans()  # fan speeds in RPM
psutil.sensors_battery()  # battery status, or None if no battery is present
GPUtil and psutil are cross-platform Python libraries for retrieving information on running processes and system utilization for CPU, memory, disks, network, sensors, and NVIDIA GPUs. They are very helpful for monitoring the performance of critical system resources.